Defining and Exploring Chemical Spaces

Abstract

Virtual libraries used in molecular discovery are often too large to exhaustively evaluate, warranting the use of algorithms to help with exploration. Algorithmic approaches like Bayesian optimization can efficiently navigate predefined chemical spaces in combination with surrogate models. On-the-fly generation during exploration enables even larger spaces to be searched, including with deep-learning-based models, although their spaces are defined only implicitly. Emerging approaches incorporate reactions into machine-learning-based models to ensure that molecules are able to be synthesized, similar to approaches previously developed for reaction-based de novo design.

Designing functional molecules with desirable properties is often a challenging, multi-objective optimization. For decades, there have been computational approaches to facilitate this process through the simulation of physical processes, the prediction of molecular properties using structure–property relationships, and the selection or generation of molecular structures. This review provides an overview of some algorithmic approaches to defining and exploring chemical spaces that have the potential to operationalize the process of molecular discovery. We emphasize the potential roles of machine learning and the consideration of synthetic feasibility, which is a prerequisite to ‘closing the loop’. We conclude by summarizing important directions for the future development and evaluation of these methods.

Chemical space can be thought of as the set of all possible molecules or materials. We generally consider chemical spaces that are more narrowly constrained by the structures or functions of the molecules they contain. For example, ‘drug-like space’ is used in the context of drug discovery to reflect the vast number of molecules similar to those of existing small-molecule therapeutics. While quantifying the size of chemical space is rarely useful, it should be noted that there are far more stable organic molecules than atoms in the solar system, which is unsurprising given the combinatorics of designing molecular graphs. Here, we focus our discussion on small molecules rather than periodic materials, biomolecules, or polymers, which correspond to distinct ‘chemical spaces’. Many studies have estimated the size of different chemical spaces [1. Bohacek R.S. et al. The art practice structure-based design: modeling perspective. Med. Res. Rev. 1996; 16: 3-50, 2. Drew K.L.M. al. Size estimation space: how big it?. J. Pharm. Pharmacol. 2012; 64: 490-495, 3. Polishchuk P.G.
al. Estimation drug-like based GDB-17 data. J. Comput. Aided Mol. Des. 2013; 27: 675-679] suggested rules organize along axes improve visualization navigability [4. Oprea T.I. Gottfries J. Chemography: navigating space. J. Comb. Chem. 2001; 3: 157-166, 5. Reymond J.-L. Awale M. Exploring Universe database. ACS Neurosci. 649-657, 6. Awale Reymond Web-based 3D-visualization DrugBank Cheminform. 2016; 8: 25, 7. Probst D. Visualization very high-dimensional data sets minimum spanning trees. J. 2020; 12: 12]. As described previously, novel framed search within [8. Coley C.W. al. Autonomous sciences part I: progress. Angew. Int. Ed. 2019; (Published online September 25, 2019. https://doi.org/10.1002/anie.201909987), 9. Coley II: outlook. Angew. https://doi.org/10.1002/anie.201909989]. The goal identify one exhibit properties. Besides strategy evaluate candidate molecules, two primary considerations must make are: (i) define space; (ii) explore space. Both contribute efficiency likelihood finding good candidate. These aspects not independent: if you repurposing FDA-approved drugs, your narrow enough exhaustive screen may feasible, but no such restriction employ select test. strategies typically iterative routines (driven human intuition driven quantitative experimental design) varying degrees sophistication, discussed later. Navigating has extensively written about (non-algorithmic) design [10. Dobson C.M. biology. Nature. 2004; 432: 824-828, 11. Lipinski C. Hopkins A. biology medicine. Nature. 855-861] exhaustively, so imposes constraints depending strategy, application, practical limitations cost time. look quite when candidates evaluated experiments. In former case, acquiring new information performance molecule requires its synthesis, purification, characterization; synthesis material availability paramount.
latter postpone until after evaluations identified putative ‘optimal’ molecule. To bound cost, still restricted expertise ‘prior’ what would viable examines emphasis role synthesizability (Table 1, Key Table). performed subject-matter experts (e.g., medicinal chemists) absence computer assistance, formalizing concepts eventually enable autonomous workflows produce novel, useful outcomes reduced reliance subjectivity. Elements cover found previous articles, recent Lemonick [12. Lemonick S. AI take us where gone before?. Chem. Eng. News. 98: 30] do address instead refer readers work coworkers [5. Reymond, 7. Probst].

Table 1. Key Table. Categorization of Approaches to Define Chemical Spaces for Molecular Discovery and an Incomplete Set of Examples of Each (a)

Predefined, unconstrained: ZINC [13. Irwin J.J. al. ZINC: free tool discover chemistry biology. J. Inf. Model. 52: 1757-1768]; ChEMBL [15. Gaulton al. ChEMBL: large-scale bioactivity database discovery. Nucleic Acids 40: D1100-D1107]; PubChem [14. Kim al. PubChem 2019 update: improved access data. Nucleic 47: D1102-D1109]; GDB [24. Reymond Space Project. Acc. 2015; 48: 722-730]
Predefined, constrained: DrugBank [16. Wishart D.S. al. DrugBank: comprehensive resource silico exploration. Nucleic 2006; 34: D668-D672]; Enamine REAL (https://enamine.net/library-synthesis/real-compounds); WuXi Virtual Library (https://www.labnetwork.com/frontend-app/p/%5C#!/library/virtual); SAVI [32. Patel H. al. Synthetically Accessible Inventory (SAVI). ChemRxiv. April 27, 2020. https://doi.org/10.26434/chemrxiv.12185559]; PGVL [33. Hu Q. al. LEAP Pfizer Global (PGVL) creation readily synthesizable ideas automatically. Methods Biol. 2011; 685: 253-276]; PLC [34. Nicolaou C.A. Proximal Lilly Collection: mapping, exploiting feasible 56: 1253-1266]
On the fly via heuristic methods, unconstrained: Fragment-based GAs [57. Venkatasubramanian V. al. Computer-aided genetic algorithms. Comput. 1994; 18: 833-844]; GroupBuild [66. Rotstein S.H. Murcko M.A. GroupBuild: fragment-based method design. J. Med. 1993; 36: 1700-1710]; BREED [58. Pierce A.C. al. BREED: generating inhibitors hybridization known ligands. Application CDK2, P38, HIV protease. J. 2768-2775]; GraphGA [62. Jensen J.H. A graph-based algorithm generative model/Monte Carlo tree space. Chem. Sci. 10: 3567-3572]; GEGL [63. Ahn al. Guiding deep exploration. arXiv. July 4, http://arxiv.org/abs/2007.04897]
On the fly via heuristic methods, constrained: SYNOPSIS [91. Vinkers H.M. al. SYNOPSIS: SYNthesize OPtimize System Silico. J. 2003; 46: 2765-2773]; Flux [88. Fechner U. Schneider G. (1): virtual scheme 699-707]; MOARF [89. Firth N.C. al. MOARF, integrated workflow optimization: implementation, biological evaluation. J. 55: 1169-1180]; DOGS [92. Hartenfeller al. DOGS: reaction-driven bioactive compounds. PLoS 8e1002380]
On the fly via machine learning, unconstrained: SMILES VAE [118. Gomez-Bombarelli R. al. Automatic data-driven continuous representation molecules. ACS Cent. 2018; 4: 268-276]; JT-VAE [75. Jin W. al. Junction variational autoencoder graph generation. arXiv. February 12, 2018. https://arxiv.org/abs/1802.04364]; SMILES RNN [72. Segler M.H.S. al. Generating focused recurrent neural networks. ACS 120-131, 73. Olivecrona al. Molecular de-novo reinforcement learning. J. 2017; 9: 48]; MolDQN [77. Zhou Z. al. Optimization learning. arXiv. October 19, http://arxiv.org/abs/1810.08678]
On the fly via machine learning, constrained: MoleculeChef [96. Bradshaw al. A model molecules. arXiv. June http://arxiv.org/abs/1906.05221]; ChemBO [97. Korovina K. ChemBO: recommendations. arXiv. August 5, http://arxiv.org/abs/1908.01425]; PGFS [98. Gottipati S.K. al. Learning synthetically accessible 26, https://arxiv.org/abs/2004.12485v1]; REACTOR [99. Horwood Noutahi E. 29, https://arxiv.org/abs/2004.14308v1]

(a) Spaces can be defined prior to exploration or on the fly by evolutionary and/or learning-based methods. They can be relatively unconstrained (i.e., only in terms of validity) or constrained (i.e., in terms of purchasability or synthesizability).

One approach enumerated list molecules. setting, stages entirely decoupled. Formally, might think problem objective function f(x), x belonging discrete X. Defining selecting finite relies domain expertise. Careful X increase contains high-performing while minimizing low-performing compounds. Common databases screening ZINC library commercially available compounds; relevance; data; approved therapeutic (see Glossary) represent ‘general-purpose’ broad relevance therefore applied many problems related [17. Walters W.P. libraries. J. 62: 1116-1124] More created domain-informed enumeration compounds relevant specific application; 1.6 million donor-bridge-acceptor trimers electronics [18. Gomez-Bombarelli al. Design efficient light-emitting diodes high-throughput approach. Nat. Mater. 15: 1120-1127] 2.8 transition-metal complexes redox flow batteries [19. Janet J.P. al. Accurate multiobjective millions transition metal neural-network-driven global optimization. ACS 6: 513-524] strict fragments included attached, R-group Privileged retrosynthetic analysis automatic fragmentation [20. Lewell X.Q. al. RECAP – combinatorial procedure: powerful technique identifying privileged applications chemistry. J. Inform. 1998; 38: 511-522, 21. Ertl P. Cheminformatics substituents: identification most common substituents, calculation substituent properties, bioisosteric groups. J. 43: 374-380]; produced recombining intended promising structure alone. Graph-theoretical studied over century, starting simple acyclic alkanes [22. Cayley Ueber die analytischen Figuren, welche der Mathematik Bäume genannt werden und ihre Anwendung auf Theorie chemischer Verbindungen. Ber. Dtsch. Ges.
1875; 8 (in German): 1056-1059, 23. Henze H.R. Blair isomeric hydrocarbons methane series. J. Am. Soc. 1931; 53: 3077-3085] However, recently recorded, evaluated, Project exemplifies modern containing atom types up certain Since original Generated DataBase (GDB) seven heavy [25. Fink T. universe 11 C, N, O, F: assembly 26.4 (110.9 stereoisomers) ring systems, stereochemistry, physicochemical compound classes, discovery. J. 2007; 342-353] enumerated, analyzed, released 166.4 billion 17 [26. Ruddigkeit L. al. Enumeration 166 GDB-17. J. 2864-2875] published numerous visualizations analyses thereof. addition benefits ensuring objective, predefinition lets impose arbitrary contents. constraint ease validation: any physically acquired testing. simplest could company’s inventory vendor catalog. Any from rapidly evaluation. Accessibility motivation make-on-demand libraries, stock straightforward protocols. Libraries applying (<100) reaction templates single-step transformations combinations materials [27. Cramer R.D. al. Virtual libraries: decision making research. J. 1010-1023, 28. Nikitin diversity programs. J. 2005; 19: 47-63, 29. Cramer al. AllChem: searching 10^20 structures. J. 21: 341-350, 30. Patel al. Knowledge-based vectors. J. 2009; 49: 1163-1184] (Figure 1); recursive generates multiple steps. There implementations [31. Hoffmann Gastreich next level navigation: going beyond enumerable libraries. Drug Discov. Today. 24: 1148-1156] efforts pharmaceutical companies [34. Nicolaou] commercial vendors (https://enamine.net/library-synthesis/real-compounds; https://www.labnetwork.com/frontend-app/p/%5C#!/library/virtual). becomes impractical store numbers due explosion products, implicitly. Whether easy synthesize depends robustness enumeration.
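
The idea of building a make-on-demand library by applying a reaction template to combinations of starting materials can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the building-block lists `ACIDS` and `AMINES` are hypothetical, and the "template" is a toy string operation on SMILES rather than a real reaction transform. A practical implementation would use a cheminformatics toolkit and validated reaction templates.

```python
from itertools import product

# Hypothetical building blocks as SMILES strings (not taken from any vendor catalog).
ACIDS = ["CC(=O)O", "CCC(=O)O", "c1ccccc1C(=O)O"]
AMINES = ["NCC", "NCCO", "NC1CCCCC1", "Nc1ccccc1"]


def amide_coupling(acid, amine):
    """Toy single-step template: join an acid ending in C(=O)O to an amine starting with N.

    Returns the product SMILES, or None when the template does not match.
    """
    if not acid.endswith("C(=O)O") or not amine.startswith("N"):
        return None
    # Drop the hydroxyl oxygen and attach the amine nitrogen to the carbonyl carbon.
    return acid[:-1] + amine


def enumerate_library(acids, amines):
    """Apply the template to every acid/amine pair and keep the unique matches."""
    products = (amide_coupling(a, b) for a, b in product(acids, amines))
    return sorted({p for p in products if p is not None})


library = enumerate_library(ACIDS, AMINES)
print(len(library))  # 3 acids x 4 amines = 12 products
```

The library size grows as the product of the building-block counts, so under the same scheme 10^4 acids and 10^4 amines would yield up to 10^8 products from a single template. This is why very large spaces are usually kept implicit rather than stored.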
Lyu colleagues cite 86% success rate 51 selected 170 130 types; estimates 60–80% 1.7-billion-member collection generated 30 (https://www.labnetwork.com/frontend-app/p/%5C#!/library/virtual). machine-learning models outcome [35. Coley graph-convolutional network reactivity. Chem. 370-377, 36. Schwaller Transformer: uncertainty-calibrated prediction. ACS 5: 1572-1583] accuracies above 90% benchmark datasets. directly enumerate products predict regio/stereoselectivity patterns [37. Tomberg predictive electrophilic aromatic substitutions Org. 84: 4695-4703, 38. Beker al. Prediction major regio-, site-, diastereoisomers Diels–Alder machine-learning: importance meaningful descriptors. Angew. 58: 4515-4519, 39. Struble T.J. al. Multitask site selectivity C–H functionalization reactions. React. 896-902] Once defined, several top-performing them. is, course, every feasibility nature time/cost constraints. It test database, smaller collections Drug Repurposing Hub [40. Corsello S.M. Hub: next-generation resource. Nat. 23: 405-408] NCATS Pharmaceutical Collection [41. Huang 10-year update. Drug 2341-2349] worth noting technologies DNA-encoded [42. Clark al. Design, libraries. Nat. 647-654] phage display [43. Smith G.P. Petrenko V.A. Phage display. Chem. 1997; 97: 391-410] trillions albeit sparse stochastic readout. If computational, practicality simply question budget. largest docking reported date, 138 99 were docked against D4 receptor AmpC, respectively [44. Lyu al. Ultra discovering chemotypes. Nature. 566: 224-229] since screened 1 same [45. Gorgulla al. An open-source platform ultra-large screens. Nature. 580: 663-668, 46. Acharya al. Supercomputer-based ensemble pipeline application Covid-19. ChemRxiv. https://doi.org/10.26434/chemrxiv.12725465.v1] exceed scale orders magnitude, argue techniques long-term inexpensive docking.
popular framework reduce overall active iterative, model-guided [47. Settles B. Active learning. Synth. Lect. Artif. Intell. Mach. Learn. 1-114] involves subsets experiments perform predictions relationship (QSPR) model: f̂(x) codifies approximation f(x). optimization, uncertainty both considered balance uncertain exploitation likely high performing [48. Frazier P.I. tutorial optimization. arXiv. 8, https://arxiv.org/abs/1807.02811v1] simpler schemes greedy search. paradigm include Eve [49. Williams al. Cheaper faster validated repositioning drugs neglected tropical diseases. J. Interface. 20141289] retrospective [50. Kangas J.D. al. Efficient responses proteins learning. BMC Bioinformatics. 2014; 143] OLED-relevant [51. Gentile F. al. Deep Docking: augmentation discovery. ACS 939-949] addressed model, f̂, low-data performance, generalization power, ability quantify [52. Muratov E.N. al. QSAR without borders. Chem. 3525-3564] methods graph-structured [53. Wu survey networks. IEEE Trans. Neural Netw. Syst. March 24, https://doi.org/10.1109/TNNLS.2020.2978386] Algorithmic improvements better handle variable costs purchasing compound) batched parallelized well plates CPUs) beneficial. iterations lead one-iteration effective. antibiotic was fewer way [54. Stokes J.M. discovery. Cell. 180: 688-702.e13
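
The iterative, model-guided search described above, in which a surrogate f̂(x) approximates the objective f(x) over a predefined discrete space X and predicted performance is weighed against uncertainty, can be sketched as follows. This is one illustrative choice under stated assumptions, not the method of any cited study: the surrogate is a small Gaussian process with an RBF kernel, the acquisition rule is an upper confidence bound, the 2-D feature vectors stand in for molecular descriptors, and `objective` is a synthetic stand-in for an expensive experiment or simulation.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(X_train, y_train, X_query, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)  # the RBF prior variance is 1
    return mean, np.sqrt(np.maximum(var, 0.0))


def explore(library, objective, n_init=5, n_iter=20, beta=2.0, seed=0):
    """Select candidates from a fixed library one at a time using a UCB rule."""
    rng = np.random.default_rng(seed)
    evaluated = [int(i) for i in rng.choice(len(library), size=n_init, replace=False)]
    scores = [objective(library[i]) for i in evaluated]
    for _ in range(n_iter):
        y = np.array(scores)
        mean, std = gp_posterior(library[evaluated], y - y.mean(), library)
        ucb = mean + beta * std      # exploitation term plus exploration term
        ucb[evaluated] = -np.inf     # never re-select an evaluated candidate
        nxt = int(np.argmax(ucb))
        evaluated.append(nxt)
        scores.append(objective(library[nxt]))
    return evaluated, scores


# Hypothetical library: 200 candidates described by 2-D feature vectors.
library = np.random.default_rng(42).uniform(0.0, 1.0, size=(200, 2))
target = np.array([0.7, 0.3])  # hidden optimum of the synthetic objective


def objective(x):
    """Synthetic stand-in for an expensive evaluation; higher is better."""
    return -float(np.sum((x - target) ** 2))


evaluated, scores = explore(library, objective)
print(len(evaluated), max(scores))
```

Setting `beta` to zero reduces the rule to greedy selection on the surrogate mean, which corresponds to the simpler schemes mentioned in the text. Larger values of `beta` favor uncertain candidates.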

Journal

Journal title: Trends in Chemistry

Year: 2021

ISSN: 2589-5974, 2589-7209

DOI: https://doi.org/10.1016/j.trechm.2020.11.004